Distributed Dictionary Learning
The paper studies distributed Dictionary Learning (DL) problems where the
learning task is distributed over a multi-agent network with time-varying
(nonsymmetric) connectivity. This formulation is relevant, for instance, in
big-data scenarios where massive amounts of data are collected/stored in
different spatial locations and it is infeasible to aggregate and/or process
all the data in a fusion center, due to resource limitations, communication
overhead or privacy considerations. We develop a general distributed
algorithmic framework for the (nonconvex) DL problem and establish its
asymptotic convergence. The new method hinges on Successive Convex
Approximation (SCA) techniques, coupled with i) a gradient tracking mechanism
instrumental in locally estimating the missing global information; and ii) a
consensus step, as a mechanism to distribute the computations among the agents.
To the best of our knowledge, this is the first distributed algorithm with
provable convergence for the DL problem and, more generally, for bi-convex
optimization problems over (time-varying) directed graphs.
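The interplay of the consensus step and the gradient-tracking mechanism can be sketched on a toy convex problem. The quadratic local costs, static ring topology, and doubly stochastic weights below are simplifying assumptions for illustration; the paper treats the nonconvex DL objective over time-varying directed graphs.

```python
import numpy as np

# Toy setup: agent i holds a local quadratic cost
# f_i(x) = 0.5 * a_i * x^2 + b_i * x; the network-wide sum is
# minimized at x* = -sum(b) / sum(a). All constants are illustrative.
rng = np.random.default_rng(0)
n_agents = 5
a = rng.uniform(1.0, 2.0, n_agents)
b = rng.uniform(-1.0, 1.0, n_agents)
x_star = -b.sum() / a.sum()

# Doubly stochastic mixing matrix for a static ring graph (an
# assumption for simplicity; the paper allows time-varying digraphs).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i + 1) % n_agents] = 0.25
    W[i, (i - 1) % n_agents] = 0.25

x = np.zeros(n_agents)          # local iterates, one per agent
grad = a * x + b                # local gradients
y = grad.copy()                 # gradient trackers

step = 0.1
for _ in range(200):
    # Consensus on the iterates, then a descent step along the tracker.
    x_new = W @ x - step * y
    grad_new = a * x_new + b
    # Tracker update: mix neighbors' trackers and add the local
    # gradient innovation, so each y_i tracks the network-average gradient.
    y = W @ y + grad_new - grad
    x, grad = x_new, grad_new

print("max deviation from optimum:", float(np.abs(x - x_star).max()))
```

Despite each agent seeing only its own cost, the tracker supplies the missing global gradient information and all agents agree on the optimum.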
Hybrid Random/Deterministic Parallel Algorithms for Nonconvex Big Data Optimization
We propose a decomposition framework for the parallel optimization of the sum
of a differentiable (possibly nonconvex) function and a nonsmooth (possibly
nonseparable), convex one. The latter term is usually employed to enforce
structure in the solution, typically sparsity. The main contribution of this
work is a novel parallel, hybrid random/deterministic decomposition
scheme wherein, at each iteration, a subset of (block) variables is updated at
the same time by minimizing local convex approximations of the original
nonconvex function. To tackle huge-scale problems, the (block) variables
to be updated are chosen according to a mixed random and deterministic
procedure, which captures the advantages of both pure deterministic and random
update-based schemes. Almost sure convergence of the proposed scheme is
established. Numerical results show that on huge-scale problems the proposed
hybrid random/deterministic algorithm outperforms both random and deterministic
schemes.
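The mixed selection rule can be illustrated on a simple sparse least-squares (lasso) instance. The greedy score, block sizes, step size, and all constants below are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

# Illustrative problem: min 0.5*||Ax - b||^2 + lam*||x||_1, solved by
# parallel proximal updates on a subset of coordinates per iteration.
rng = np.random.default_rng(1)
m, n = 40, 100
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:5] = rng.standard_normal(5)   # sparse ground truth
b = A @ x_true
lam = 0.1
L = np.linalg.norm(A, 2) ** 2         # Lipschitz constant of the gradient

def soft(v, t):
    # Soft-thresholding: proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(n)
for _ in range(300):
    g = A.T @ (A @ x - b)
    # Deterministic part: coordinates with the largest proximal residual.
    resid = np.abs(x - soft(x - g / L, lam / L))
    det = np.argsort(resid)[-5:]
    # Random part: a few uniformly sampled coordinates.
    rnd = rng.choice(n, size=5, replace=False)
    S = np.union1d(det, rnd)
    # Parallel proximal-gradient update of the selected coordinates only.
    x[S] = soft(x[S] - g[S] / L, lam / L)

obj = 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()
print("final objective:", round(float(obj), 3))
```

The greedy half targets the coordinates that currently matter most, while the random half guarantees every coordinate is eventually visited.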
On the impact of activation and normalization in obtaining isometric embeddings at initialization
In this paper, we explore the structure of the penultimate Gram matrix in
deep neural networks, which contains the pairwise inner products of outputs
corresponding to a batch of inputs. In several architectures it has been
observed that this Gram matrix becomes degenerate with depth at initialization,
which dramatically slows training. Normalization layers, such as batch or layer
normalization, play a pivotal role in preventing the rank collapse issue.
Despite promising advances, the existing theoretical results (i) do not extend
to layer normalization, which is widely used in transformers, and (ii) cannot
characterize the bias of normalization quantitatively at finite depth.
To bridge this gap, we provide a proof that layer normalization, in
conjunction with activation layers, biases the Gram matrix of a multilayer
perceptron towards isometry at an exponential rate with depth at
initialization. We quantify this rate using the Hermite expansion of the
activation function, highlighting the importance of higher-order
Hermite coefficients in the bias towards isometry.
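As a hedged illustration of the quantity driving the rate, the normalized (probabilists') Hermite coefficients of an activation can be computed numerically. The choice of ReLU and the quadrature order are illustrative assumptions; the paper's rate depends on the coefficients of whichever activation is used.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from math import factorial, sqrt, pi

# Gauss-Hermite quadrature nodes/weights for the weight exp(-x^2/2).
nodes, weights = hermegauss(80)

def hermite_coeff(f, k):
    # c_k = E[f(Z) * He_k(Z)] / sqrt(k!) for Z ~ N(0, 1), i.e. the
    # coefficient in the orthonormal Hermite basis.
    e_k = np.zeros(k + 1)
    e_k[k] = 1.0
    vals = f(nodes) * hermeval(nodes, e_k)
    return float(weights @ vals) / sqrt(2 * pi) / sqrt(factorial(k))

relu = lambda x: np.maximum(x, 0.0)
coeffs = [hermite_coeff(relu, k) for k in range(5)]
print([round(c, 4) for c in coeffs])
```

For ReLU the first few coefficients are known in closed form (c0 = 1/sqrt(2*pi), c1 = 1/2, c2 = 1/sqrt(4*pi), c3 = 0), which makes the quadrature easy to sanity-check.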
Batch Normalization Orthogonalizes Representations in Deep Random Networks
This paper underlines a subtle property of batch-normalization (BN):
Successive batch normalizations with random linear transformations make hidden
representations increasingly orthogonal across layers of a deep neural network.
We establish a non-asymptotic characterization of the interplay between depth,
width, and the orthogonality of deep representations. More precisely, under a
mild assumption, we prove that the deviation of the representations from
orthogonality rapidly decays with depth up to a term inversely proportional to
the network width. This result has two main implications: 1) Theoretically, as
the depth grows, the distribution of the representation -- after the linear
layers -- contracts to a Wasserstein-2 ball around an isotropic Gaussian
distribution. Furthermore, the radius of this Wasserstein ball shrinks with the
width of the network. 2) In practice, the orthogonality of the representations
directly influences the performance of stochastic gradient descent (SGD). When
representations are initially aligned, we observe that SGD wastes many
iterations orthogonalizing representations before classification. Nevertheless, we
experimentally show that starting optimization from orthogonal representations
is sufficient to accelerate SGD, with no need for BN.
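The phenomenon can be reproduced in a small simulation. The mean-free BN variant (row-wise RMS normalization with no learned scale/shift), the width, batch size, and depth below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
width, batch, depth = 512, 4, 50

def batch_norm(H):
    # RMS-normalize each feature (row) across the batch: a mean-free BN
    # at initialization, assumed here for simplicity.
    rms = np.sqrt((H ** 2).mean(axis=1, keepdims=True))
    return H / (rms + 1e-8)

def mean_offdiag_cosine(H):
    # Average |cosine similarity| between distinct samples (columns);
    # 0 means perfectly orthogonal representations.
    Hn = H / np.linalg.norm(H, axis=0, keepdims=True)
    G = Hn.T @ Hn
    return float(np.abs(G[~np.eye(batch, dtype=bool)]).mean())

# Nearly aligned inputs: one shared direction plus tiny perturbations.
H = rng.standard_normal((width, 1)) + 0.01 * rng.standard_normal((width, batch))
before = mean_offdiag_cosine(H)

for _ in range(depth):
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    H = batch_norm(W @ H)

after = mean_offdiag_cosine(H)
print(before, "->", after)
```

Starting from almost perfectly aligned samples, the stacked random-linear-plus-BN layers drive the pairwise cosines down, in line with the depth-wise decay the paper characterizes (up to a width-dependent floor).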
Decentralized Dictionary Learning Over Time-Varying Digraphs
This paper studies Dictionary Learning problems wherein the learning task is
distributed over a multi-agent network, modeled as a time-varying directed
graph. This formulation is relevant, for instance, in Big Data scenarios where
massive amounts of data are collected/stored in different locations (e.g.,
sensors, clouds) and aggregating and/or processing all data in a fusion center
might be inefficient or infeasible, due to resource limitations, communication
overheads or privacy issues. We develop a unified decentralized algorithmic
framework for this class of nonconvex problems, which is proved to converge to
stationary solutions at a sublinear rate. The new method hinges on Successive
Convex Approximation techniques, coupled with a decentralized tracking
mechanism aiming at locally estimating the gradient of the smooth part of the
sum-utility. To the best of our knowledge, this is the first provably
convergent decentralized algorithm for Dictionary Learning and, more generally,
bi-convex problems over (time-varying) (di)graphs.
Residual Energy Based Cluster-head Selection in WSNs for IoT Application
A wireless sensor network (WSN) groups specialized transducers that provide
sensing services to Internet of Things (IoT) devices with limited energy and
storage resources. Since replacement or recharging of batteries in sensor nodes
is almost impossible, power consumption becomes one of the crucial design
issues in WSNs. Clustering algorithms play an important role in power
conservation in such energy-constrained networks. Choosing cluster heads
appropriately can balance the load in the network, thereby reducing energy
consumption and enhancing network lifetime. The paper focuses on an efficient
cluster-head election scheme that rotates the cluster-head position among the
nodes with higher energy levels than the others. The algorithm considers
initial energy, residual energy, and an optimal number of cluster heads to
elect the next group of cluster heads for the network, which suits IoT
applications such as environmental monitoring, smart cities, and similar
systems. Simulation analysis shows that the modified version outperforms the
LEACH protocol, enhancing throughput by 60%, lifetime by 66%, and residual
energy by 64%.
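A LEACH-style election threshold modified by residual energy can be sketched as follows. The weighting by the residual-to-initial energy ratio and all constants are illustrative assumptions, not the paper's exact formula.

```python
import random

P = 0.1  # desired fraction of cluster heads per round (assumed)

def threshold(p, round_no, residual, initial):
    # LEACH-style base threshold, scaled by the residual-to-initial
    # energy ratio so that high-energy nodes are favored (an assumption).
    base = p / (1 - p * (round_no % int(1 / p)))
    return base * (residual / initial)

def elect_heads(nodes, round_no, p=P, rng=random.Random(3)):
    heads = []
    for node in nodes:
        t = threshold(p, round_no, node["residual"], node["initial"])
        if rng.random() < t:
            heads.append(node["id"])
    return heads

# Synthetic network: nodes 0-49 have high residual energy, 50-99 low.
nodes = [{"id": i, "initial": 2.0,
          "residual": 2.0 if i < 50 else 0.2} for i in range(100)]
counts = {i: 0 for i in range(100)}
for r in range(200):
    for h in elect_heads(nodes, r):
        counts[h] += 1

high = sum(counts[i] for i in range(50))
low = sum(counts[i] for i in range(50, 100))
print("elections: high-energy =", high, ", low-energy =", low)
```

Over many rounds, nodes with more residual energy are elected far more often, which is the load-balancing behavior the abstract describes.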
Communication Technologies for edge learning and inference: a novel framework, open issues, and perspectives
With the continuous advancement of smart devices and their demand for data, complex computation that was previously exclusive to the cloud server is now moving towards the edge of the network. For numerous reasons (e.g., applications demanding low latency and data privacy), data-based computation has been brought closer to its originating source, forging the Edge Computing paradigm. Together with Machine Learning, Edge Computing has turned into a powerful local decision-making tool, fostering the advent of Edge Learning. The latter, however, is delay-sensitive as well as resource-hungry in terms of hardware and networking. New methods have been developed to solve, or at least mitigate, these issues, including those proposed in this research. In this study, we first investigate representative communication methods for edge learning and inference (ELI), focusing on data compression, latency, and resource management. Next, we propose an ELI-based video data prioritization framework that considers only frames containing events, thereby significantly reducing transmission and storage requirements when deployed in surveillance networks. Furthermore, in this overview, we critically examine various communication aspects of Edge Learning, analyzing their issues and highlighting their advantages and disadvantages. Finally, we discuss challenges and present open issues that are yet to be overcome.
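The event-based prioritization idea can be sketched with simple frame differencing: a frame is marked for transmission only when its mean pixel change against the last kept frame exceeds a threshold. The threshold value and the synthetic frames are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def prioritize(frames, threshold=0.1):
    # Keep a frame only if it differs enough from the last kept frame;
    # everything else is dropped before transmission/storage.
    kept, prev = [], None
    for idx, frame in enumerate(frames):
        if prev is None or np.abs(frame - prev).mean() > threshold:
            kept.append(idx)      # "event" frame worth transmitting
            prev = frame
    return kept

# Synthetic surveillance clip: static scene for frames 0-3, then a
# persistent change (the "event") from frame 4 onward.
frames = [np.zeros((8, 8)) for _ in range(4)]
frames += [np.ones((8, 8)) for _ in range(6)]
kept = prioritize(frames)
print(kept)
```

Only the first frame and the frame where the scene changes survive, so a mostly static surveillance feed shrinks to a handful of event frames.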